Add CONSERVATIVE_WHITESPACE_ON_MINIFYING to retain inline text spaces #148
+100
−24
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
I was looking into the issue outlined in #121 and #21 as it's affecting me. Essentially the current system of completely removing whitespace in a text flow has some bugs and it's not easy to fix. So, in an effort to at least get things working I've added a
CONSERVATIVE_WHITESPACE_ON_MINIFYING
option to turn off eagerly removing whitespace and always leave at least 1 space (I'm also hoping that feature may be useful to someone else for other purposes). The default is to continue working as before for backwards compatibility.The slightly longer explanation
If you try minifying
a <i>b</i>
it will properly reduce toa <i>b</i>
and the display output in a browser will be "a b" (note the space). However, if you try minifyinga<i> b</i>
it will reduce toa<i>b</i>
and then get displayed as "ab" (note, no space). Ideally, the 2nd case should end up beinga <i>b<i>
to retain that inline space but at the right spot.Now, I tried going through the code to see if I could fix that 2nd case, but I'm not sure I see a way to get there without some major restructuring. One issue is that when examining the
NavigableString
of' b'
inis_inflow
, theprevious_sibling
isNone
(in the above example). BS4 doesn't treat a tag next to a text block as siblings.I did find another project that seemed to do a good job of this here: https://github.com/kangax/html-minifier/blob/51ce10f4daedb1de483ffbcccecc41be1c873da2/src/htmlminifier.js#L65-L83 However, that's in JS using completely different libraries and would require quite a bit of time to rework with BS4. That project also had a "conservative whitespace" option and I thought it might be a good feature to add and also provide a workaround for this issue until a better solution is developed.
One last point... Any solution essentially assumes that inline/inline-block HTML elements will remain such, but CSS could easily break things if someone were to do something like
<div style="display:inline">
. Obviously that's a silly thing to do, but this change allows for things like that to work as expected. Sometimes HTML on the page comes from third party libraries and it's not always possible to fix these things to have proper HTML.